Welcome
to the 2023 #DegreesNYC R Course! As you know, #DegreesNYC is on a
mission to radically transform the NYC public school system at all
levels. Collective impact and youth engagement are pillars of
#DegreesNYC’s mission and are the primary tools that the organization
uses to further its goals. The folks at #DegreesNYC, however, are aware
that the policy space in which they exist values quantitative evidence
highly. This course is meant to democratize quantitative expertise, by
giving you an introduction to some of the skills you will need to find,
manipulate, and analyze data using R. Our hope is that this will not
only be enriching for you personally, but will better position you to
make policy arguments in the future.
During the course you will work primarily in R, and much of that work will be done in R Markdown. Don’t worry if you are not familiar with the language or with coding at all; this class and the Codecademy course that supplements it are meant for beginners. If you do have experience with R and feel that the course is moving too slowly or that you are not being sufficiently challenged, here are some options: 1) add complexity to the examples and challenges presented in class or on Codecademy, 2) read ahead in the syllabus and work on something that is at your level, or 3) focus extra time on your project. Though many other languages are used for data analysis (including Python, SAS, Stata, and SQL), you will not explore those. If you’d like to know more about one of them, feel free to reach out to me; I am happy to share what I know or to point you to a starting place if I am inexperienced in the language myself.
Feel free to email me at any time at regg19970@gmail.com.
Reggie Gilliard came to data analytics and R through an unusual path. He started his career as a teacher in private schools, first at San Francisco Day School and then at Collegiate. Education is now and has always been important to him, but feeling that his impact could be greater outside of the classroom, he set off to pursue a master’s degree in education policy.
That degree led Reggie to the Research Alliance for New York City Schools, where his relationship with #DegreesNYC began. That position led him to a PhD at Teachers College, Columbia University in the Economics and Education program. Currently, he is a Data Analytics Developer at Mathematica. He has experience using SAS, SQL, Stata, R, and Python (as well as tools like Excel), but is most well versed in R and Stata.
Outside of work, he enjoys reading; playing sports, video games, and board games; trying new restaurants; and generally spending time with his friends and family.
The course will be split into two units: Unit 1 - Learning and Unit 2 - Project.
During the Learning Unit you will be focused on working through lessons. Each session will be split into two parts: 1) a fifteen-minute mini-lesson delivered by me or another member of the #DegreesNYC team, and 2) a Codecademy lesson. The mini-lessons are designed to align as closely as possible with the Codecademy lesson for the week, to reinforce and clarify the topics that you will cover.
Unit 2 will be dedicated to a complete analysis of a data set of your choice. For the final six weeks of the course, finishing this project will be the goal. We may still briefly cover some topics that are beyond the scope of the lessons but useful for creating an appealing report; however, most of the time will be spent in breakout groups working on your projects. All projects will be undertaken individually, but teams of two will be permitted if there is a compelling reason.
To complete the project you will need to take your data set through the following stages:
Research Question Creation
Data Identification
Identify a data set that can be used to address the question that you are interested in. If you have access to a data set that you want to examine, you are encouraged to do that (Note: think carefully about privacy concerns. If you are the only one who should be seeing the data, you may want to create a mock data set that your classmates and I can view so that we can support your project work). If you don’t have access to a data set there are many places that you can go to find a data set to use:
NYC Open Data - New York City focused data sets.
Data.gov - U.S. data sets.
IPEDS - Postsecondary education data sets.
NCES - National (or state specific) school data sets.
Kaggle - Public data sets on a wide variety of topics (account needed).
Data Cleaning
Use the tools you have learned in the class to clean the data that you’ve downloaded. Some considerations:
Data Analysis
Data Visualization
ggplot2 is the package that we will
use for visualization in this class and the possibilities are broad. The
ggplot2 gallery shows the
wide range of graphs that you can create using ggplot2 and
related packages, and even provides code to use to get started.
I encourage you to use the time in breakout rooms collaboratively: share your screens, share code, ask each other questions, and learn from one another. Use your classmates and me as a resource to create something you feel is worthwhile. Although I suggest doing the project alone, this is to maximize learning, not to prevent “cheating”. Any and all collaboration is welcome in this course.
This syllabus will be the repository for everything course-related. Information about lessons, the project, and enrichment is all contained here. The table of contents in the top left has links to the content for Units 1 and 2, the introduction, and enrichment activities. Within the Unit 1 header there are tabs for each of the 6 lessons that you will work through during our time together. These lessons each also have tabs for the topics you will cover, the lesson itself, and any helpful links that you might want to explore. This document will be hosted publicly on GitHub, but it is a living document. If anything changes or if there are problems with the URL leading to the syllabus, I will let you know as soon as possible.
R is one of the
most common languages for data science and data analysis. This is
because, with experience, one can take a project from beginning to end
only using R. You can download data; manipulate it by creating new
variables, dropping missing observations, etc.; perform statistical
analysis; and visualize and output results. R, like Stata and SAS, has
built-in statistical analysis functionality, but unlike those two
languages it is 100% free and open source. There is a wide range of
functions in base R, but the R community is very strong and has built a
host of tools that make doing data analysis and visualization much
easier than it is in base R (you will be working with one such suite of
tools, the tidyverse/dplyr, in this class). If you are hoping to get a
job doing analytic work, if you want advanced education in a social
science field, if you want an introduction to coding/computer science,
or if you are just looking to better understand quantitative analysis, R
is a good place to begin.
To install R you will need about 500 MB of free space on your computer (you may actually need less, but freeing up 500 MB ensures you will be able to download the data and packages used in the course). Perform the following series of steps:
Click the link to download R for your operating system
Select base
Select the download link at the top of the screen.
You should also install RStudio. Here is how you can do that:
Click the download button in the upper right hand corner
Scroll down and select RStudio Desktop. Select download RStudio.
Scroll down and select the appropriate installation for your
operating system.
As noted in the introduction, this class will use Codecademy to aid in learning basic coding skills in R and the basic R workflow. You will need a Codecademy account to retain your progress between sessions. You can register for one here.
Another tool you will use throughout the class, and one you will also be using in your Codecademy work, is a suite of R packages called the Tidyverse. The Tidyverse is designed to make working with data easier and is commonly used in many organizations (we use it at Mathematica, for example). There are eight major packages in the Tidyverse as well as many other specialized packages, but the following are important for this class:
%>%) which you can make use of for many of your data
cleaning tasks in this class (although R now has a built-in pipe:
|>).
To install the Tidyverse, open R or RStudio and type
install.packages("tidyverse") into the console.
Finally, you will use R Markdown when creating your projects. R Markdown allows you to write code and text into the same document and have it rendered nicely as a word document, PDF, or html file. I used R Markdown to write this syllabus. Using R Markdown makes integrating visualizations, code, and analyses into your reports simple.
That’s it. You’re all set up!
While you are in class you can ask me questions and, of course, feel free to email me and other members of the #DegreesNYC team as you need support, but there are other things you can do to get answers before asking me or a classmate for help.
Type ?functionname or help(functionname) into the
console and a help file will pop up (in the bottom right corner of your
screen if you are using RStudio) that has useful information about the
function.
?paste
vignette("dplyr")
Once you’ve installed R and set up RStudio, open RStudio, open a new script, and copy and paste the code below to install the exercises that accompany this course.
# Try to install the swirl package
try(install.packages("swirl", repos = "https://cloud.r-project.org/"))
# Uninstall the DegreesNYC_RCourse
try(swirl::uninstall_course("DegreesNYC_RCourse"), silent = TRUE)
# Install the DegreesNYC_RCourse
try(swirl::install_course_github("R-Gilliard-Jr", "DegreesNYC_RCourse", branch = "main"))
There are many ways to output values in R so that you can see them. One of the easiest is just typing the value into the console.
4
## [1] 4
"Hello"
## [1] "Hello"
Another way that is often used is to wrap the item you want to show
in the print() function.
print(4)
## [1] 4
print("Hello")
## [1] "Hello"
Variables in R are convenient ways to keep track of values. There are times when you will want to perform an operation, save the value, perform some intermediate steps, and then use the original value that you saved. Let’s work with the example of addition. You could do the following to add 4 twice:
4 + 4
## [1] 8
This is perfectly fine. But you could also use variables to add 4 twice like this:
x <- 4
x + x
## [1] 8
Variables allow you to work with values dynamically and without the
struggle of trying to remember all of the values that you are working
with individually. Keeping the math theme, suppose you wanted to
implement the quadratic formula, which is written:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
And you are provided the following values:
$$a = 2, \quad b = 4, \quad c = 2$$
You could do this directly with numbers, as in
(-4 + sqrt(4^2 - 4*2*2))/(2*2)
## [1] -1
Or it could be done using variables, as in
a = 2
b = 4
2 -> c
(-b + sqrt(b^2 - 4*a*c))/(2*a)
## [1] -1
The advantage of the second option is that if the numbers you would like to enter into the formula change, all you have to do is change them once, when assigning them, and you can use the formula again. In the first case, you would need to go through carefully and make sure you’ve got all of the numbers entered into their correct locations. The true power of variables will not be seen until you begin to work with for loops, but, for now, think of them as useful ways to store information.
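As a rough illustration of that advantage, the sketch below reuses the same formula with a different set of values. The values (a = 1, b = -3, c = 2) and the names root_plus and root_minus are made up for this example and are not part of the course materials. It also computes the second root by switching the plus to a minus.

```r
# Hypothetical values chosen for illustration only
a <- 1
b <- -3
c <- 2

# The same formula as before, once with + and once with -
root_plus  <- (-b + sqrt(b^2 - 4*a*c))/(2*a)
root_minus <- (-b - sqrt(b^2 - 4*a*c))/(2*a)

print(root_plus)
## [1] 2
print(root_minus)
## [1] 1
```

Only the three assignment lines changed; the formula itself was left untouched.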
In the examples above I created four different variables: x, a, b, and c. I assigned values to those variables in three different ways: <-, =, and ->. You can use any of these assignment operators to assign a value to a variable. To assign a value with the equal sign or the left-arrow, the variable name should go on the left-hand side of the operator and the value that you want to assign should be on the right-hand side.
x <- 2
print(paste("x equals:", x))
## [1] "x equals: 2"
y = 2023
print(paste("y equals:", y))
## [1] "y equals: 2023"
There is also the right-arrow operator. If using this to assign a value to a variable, the value should be on the left-hand side and the name of the variable should be on the right-hand side.
"Hello" -> z
print(paste("z equals:", z))
## [1] "z equals: Hello"
There is nothing wrong with using any of the assignment operators so
you should go with the one you are most comfortable with, especially as
you learn. The industry standard, however, is to use the left-arrow
unless there is good reason. This prevents confusion when you are trying
to do a check for equality (==) and may be easier to read
than the right-arrow. I will use the left-arrow exclusively throughout
the remainder of this course.
There are 6 basic data types: logical, numeric, integer, complex, character, and raw.
In this class you will focus on the logical, numeric, and character data types. These are the three most common types. See the helpful links for more information about the types that you will not cover.
Data with class logical can only take on the values of true or false.
Another name for this type of data is Boolean. In R, TRUE
and FALSE are equivalent to 1 and 0 respectively. That
means you can do things like add and subtract logical data even though
they are not technically numeric.
x <- TRUE
y <- FALSE
z <- TRUE
class(x)
## [1] "logical"
x + y
## [1] 1
x + z
## [1] 2
x - z
## [1] 0
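Because TRUE counts as 1 and FALSE as 0, a common use of this behavior is counting how many values meet a condition. The sketch below is only an illustration; the scores vector and the cutoff of 65 are invented for this example.

```r
# Hypothetical test scores, for illustration only
scores <- c(55, 72, 88, 91, 64)

# Comparing a vector to a number gives one TRUE/FALSE per value
passed <- scores >= 65
passed
## [1] FALSE  TRUE  TRUE  TRUE FALSE

# Adding up the TRUEs counts how many scores passed
sum(passed)
## [1] 3

# The average of the TRUEs is the share that passed
mean(passed)
## [1] 0.6
```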
The numeric data type holds all real numbers. This means any number, including those with decimals and negatives, but excluding imaginary numbers. We have been working with real numbers throughout this lesson. Examples of real numbers include 4, -25, 1.33333, and pi.
class(4)
## [1] "numeric"
You might wonder how R can tell the difference between a numeric 4
and an integer 4. Mathematically, after all, 4 is both an integer and a
real number. The way to pass an explicitly integer value to R is to
append an L to the end of the number. For example:
class(4L)
## [1] "integer"
class(1040L)
## [1] "integer"
Appending an L will not convert a number which has a
decimal into an integer:
class(1.33L)
## [1] "numeric"
Finally, there are other functions that can convert from numeric to integer for you (and indeed convert between any of the types):
x <- 4
class(x)
## [1] "numeric"
x <- as.integer(x)
class(x)
## [1] "integer"
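The same as.*() pattern works for the other types. The sketch below shows a few conversions; the specific values are arbitrary. One detail worth knowing is that as.integer() drops the decimal portion rather than rounding.

```r
# Numeric to integer: the decimal portion is dropped, not rounded
as.integer(1.9)
## [1] 1

# Numeric to character
as.character(4)
## [1] "4"

# Character to numeric (works when the string looks like a number)
as.numeric("1234")
## [1] 1234

# Numeric to logical: 0 becomes FALSE
as.logical(0)
## [1] FALSE
```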
The character data type holds strings. Strings contain a series of characters. Here are some examples of character data:
class("Hello")
## [1] "character"
class("1234")
## [1] "character"
You will notice that even though the second string (“1234”) contains only numbers, R considers it to be of class character. This is important to remember because character and numeric values cannot be used in the same ways:
try("1" + "2")
## Error in "1" + "2" : non-numeric argument to binary operator
1 + 2
## [1] 3
The first line of code throws an error, telling us that “1” and “2”
are not numeric and therefore cannot be added. The second line of code
returns what you expect, 3. Folks coming from other programming
languages should note that string concatenation cannot be done with the
+ operator in R (although you could create an operator that
does this yourself).
`%$$%` <- function(lhs, rhs) {
  out <- paste0(lhs, rhs)
  return(out)
}
"Hel" %$$% "lo"
## [1] "Hello"
"How" %$$% " are" %$$% " you?"
## [1] "How are you?"
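In everyday code you would more likely reach for functions that base R already provides. As a brief sketch: as.numeric() converts strings that look like numbers so they can be added, and paste() and paste0() join strings. paste() puts a space between the pieces by default, while paste0() does not.

```r
# Convert the strings to numbers first, then add
as.numeric("1") + as.numeric("2")
## [1] 3

# Join strings with no separator
paste0("Hel", "lo")
## [1] "Hello"

# Join strings with a space between each piece
paste("How", "are", "you?")
## [1] "How are you?"
```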
R is designed with statistical analysis in mind. Thus it is easy to do math in R. Most of the mathematical operators are intuitive, but there are some which you may not be familiar with.
Addition, subtraction, multiplication, and division are all straightforward.
2 + 2
## [1] 4
2 - 2
## [1] 0
2 * 2
## [1] 4
2/2
## [1] 1
To exponentiate a number, use either ^ or **.
3^2
## [1] 9
3**2
## [1] 9
There are also operators for the modulo of numbers and for integer division. The modulo returns the remainder when dividing two numbers. For example 3 modulo 2 is 1. The modulo operator is %%.
3 %% 2
## [1] 1
6 %% 4
## [1] 2
Integer division returns the integer portion of the result when dividing two numbers. For example 5 integer divided by 2 is 2. The integer division operator is %/%.
5 %/% 2
## [1] 2
7 %/% 6
## [1] 1
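These two operators are often used together. As a small illustration (the 135-minute figure is made up), the sketch below splits a number of minutes into whole hours and leftover minutes, and then uses the modulo to check whether a number is even.

```r
# Hypothetical duration, for illustration only
total_minutes <- 135

# Integer division gives the whole hours
hours <- total_minutes %/% 60
# Modulo gives the minutes left over
minutes <- total_minutes %% 60

hours
## [1] 2
minutes
## [1] 15

# A number is even when dividing it by 2 leaves no remainder
10 %% 2 == 0
## [1] TRUE
```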
You are also likely familiar with most of the logical operators.
These are things in math which do not return a number, but rather return
a value of TRUE or FALSE. Greater than
(>), less than (<), greater than or
equal to (>=), less than or equal to
(<=), equal to (==), and not equal to
(!=) are all available in R. Note that you must use 2 equal
signs in R when you want to test whether two things are equal. One equal sign, as we
discussed earlier, is for assigning values to variables.
2 < 3
## [1] TRUE
3 > 2
## [1] TRUE
3 == 2
## [1] FALSE
3 != 2
## [1] TRUE
3 <= 2
## [1] FALSE
3 >= 2
## [1] TRUE
There are also operators for or (|),
and (&), and not
(!).
numlist <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
# Or is one, the other, or both
numlist[numlist < 3 | numlist > 7]
## [1] 1 2 8 9
# And is all
numlist[numlist < 9 & numlist > 7]
## [1] 8
# Not negates whatever follows
numlist[!(numlist >= 5)]
## [1] 1 2 3 4
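Conditions can also be combined with arithmetic operators. The sketch below, which uses its own example vector named values, keeps the even numbers greater than 4 and then shows that parentheses control what ! negates.

```r
# Example vector, for illustration only
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Even AND greater than 4
values[values %% 2 == 0 & values > 4]
## [1] 6 8

# NOT (even OR greater than 4): the ! applies to everything in the parentheses
values[!(values %% 2 == 0 | values > 4)]
## [1] 1 3
```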
One of R’s strengths, that it is open source, is also what makes it difficult to learn. There are sometimes many ways to do the same thing with solutions coming from base R, as well as many different packages. For example, suppose you load the built-in R data set mtcars and want to view only cars with mpg > 20:
# Print data set
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# Select cars with greater than 20 mpg using base R
mtcars[mtcars$mpg > 20, ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# Select cars with greater than 20 mpg using dplyr
# (dplyr must be loaded first; otherwise R finds stats::filter() instead)
library(dplyr)
filter(mtcars, mpg > 20)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
You can see these two solutions are equivalent. Industry standards emerge for precisely this reason: to standardize workflows so that colleagues, collaborators, critics, and competitors can understand each other’s code more easily. dplyr has become the industry standard for much of data analysis in R.
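To underline the point about multiple routes to one result, base R itself offers more than one way to do this. The sketch below uses subset(), a base R function, and checks that it picks the same cars as the bracket approach. The names with_brackets and with_subset are just labels for this example.

```r
# Bracket indexing, as in the base R example
with_brackets <- mtcars[mtcars$mpg > 20, ]

# subset() is another base R way to express the same selection
with_subset <- subset(mtcars, mpg > 20)

# Both keep the same number of cars
nrow(with_subset)
## [1] 14

# And the same cars, in the same order
identical(rownames(with_brackets), rownames(with_subset))
## [1] TRUE
```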
dplyr provides a set of tools that make data cleaning and transformation more intuitive. You can read the package’s vignette for more information. This lesson will explain the 8 functions outlined in the vignette, using the mtcars data set. I will also talk about the pipe and why it is a useful tool for data analysis and creating legible code.
There are eight basic dplyr functions (and many more that are available for more specialized operations), but one of the most useful things in the dplyr package comes from a different package called magrittr: the %>% (pipe) operator.
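The pipe operator takes the result of the expression on its left and inserts it as the first argument of the function on its right. For example, (1 + 1) %>% sum(2) is the same as sum(1 + 1, 2), which equals 4. The parentheses matter: %>% is evaluated before +, so without them only the second 1 would be piped into sum().
(1 + 1) %>%
  sum(2)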
## [1] 4
The pipe operator makes workflows cleaner while still being easy to follow by removing the need for many intermediate saving steps. To illustrate, here is how one could multiply mpg by 2 and divide cyl by 3 in both base R and dplyr.
# In base R
# First duplicate the data set
mtcars2 <- mtcars
# Then multiply mpg by 2
mtcars2$mpg <- mtcars$mpg * 2
# Then divide cyl by 3
mtcars2$cyl <- mtcars$cyl/3
print(mtcars2)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 42.0 2.000000 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 42.0 2.000000 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 45.6 1.333333 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 42.8 2.000000 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 37.4 2.666667 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 36.2 2.000000 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 28.6 2.666667 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 48.8 1.333333 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 45.6 1.333333 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 38.4 2.000000 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 35.6 2.000000 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 32.8 2.666667 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 34.6 2.666667 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 30.4 2.666667 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 20.8 2.666667 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 20.8 2.666667 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 29.4 2.666667 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 64.8 1.333333 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 60.8 1.333333 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 67.8 1.333333 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 43.0 1.333333 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 31.0 2.666667 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 30.4 2.666667 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 26.6 2.666667 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 38.4 2.666667 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 54.6 1.333333 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 52.0 1.333333 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 60.8 1.333333 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 31.6 2.666667 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 39.4 2.000000 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 30.0 2.666667 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 42.8 1.333333 121.0 109 4.11 2.780 18.60 1 1 4 2
# In dplyr
mtcars2 <- mtcars %>%
  mutate(mpg = mpg * 2,
         cyl = cyl/3) %>%
  print()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 42.0 2.000000 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 42.0 2.000000 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 45.6 1.333333 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 42.8 2.000000 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 37.4 2.666667 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 36.2 2.000000 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 28.6 2.666667 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 48.8 1.333333 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 45.6 1.333333 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 38.4 2.000000 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 35.6 2.000000 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 32.8 2.666667 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 34.6 2.666667 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 30.4 2.666667 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 20.8 2.666667 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 20.8 2.666667 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 29.4 2.666667 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 64.8 1.333333 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 60.8 1.333333 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 67.8 1.333333 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 43.0 1.333333 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 31.0 2.666667 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 30.4 2.666667 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 26.6 2.666667 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 38.4 2.666667 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 54.6 1.333333 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 52.0 1.333333 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 60.8 1.333333 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 31.6 2.666667 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 39.4 2.000000 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 30.0 2.666667 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 42.8 1.333333 121.0 109 4.11 2.780 18.60 1 1 4 2
Again, the data sets are identical in the end, but in the dplyr case only one assignment operator was necessary. In base R, three were used. Also, in dplyr the data set mtcars is only referenced once and mtcars2 is only referenced when it is assigned. In base R, both data sets are referenced multiple times. The time savings and clarity that can come from using the pipe are not readily apparent from these simple examples, but as your work becomes more complex you will notice the pipe becoming more and more valuable.
Two additional notes: 1) For folks who are coming from other object-oriented
programming languages it might be useful to think of the pipe as similar
to method chaining. The two are not identical, but the comparison may be
useful for understanding why the pipe is worthwhile. 2) More recent
versions of R (4.1.0 and later) have a built-in pipe, |>, which does not
have to be loaded from magrittr. In RStudio, you can make the pipe keyboard
shortcut insert it by going to Tools > Global Options > Code and selecting
“Use native pipe operator”.
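As a quick sketch of the idea behind piping, the example below uses the built-in pipe (this assumes R 4.1.0 or later) so that it runs without loading any packages. It compares a nested call, which has to be read from the inside out, with the piped version, which reads top to bottom. The vector nums is made up for illustration.

```r
# Example vector, for illustration only
nums <- c(1, 3, 5)

# Nested: read from the inside out (sum first, then square root)
sqrt(sum(nums))
## [1] 3

# Piped: read from top to bottom
nums |>
  sum() |>
  sqrt()
## [1] 3
```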
dplyr’s filter() function allows you to select rows
meeting certain criteria. Suppose you are using the mtcars data set and
only want to see cars with exactly 6 cylinders. Then with dplyr:
mtcars %>%
  filter(cyl == 6)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Or suppose you only wanted cars with 4 or more forward gears. Then
mtcars %>%
  filter(gear >= 4)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Combine filter() with logical operators and other
functions for more complex selection of observations. In this case, for
example, the data is subset to only cars made by Mazda:
mtcars %>%
  mutate(make = rownames(mtcars)) %>%
  filter(grepl("Mazda", make))
## mpg cyl disp hp drat wt qsec vs am gear carb make
## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 Mazda RX4
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4 Mazda RX4 Wag
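As another sketch of combining conditions, the example below joins two conditions with & to keep only 4-cylinder cars that get more than 30 mpg. It assumes dplyr is installed and loads it explicitly; the name efficient_small is just a label for this example.

```r
# Assumes the dplyr package is installed
library(dplyr)

# Keep rows where BOTH conditions are TRUE
efficient_small <- mtcars %>%
  filter(cyl == 4 & mpg > 30)

# Four cars meet both conditions
nrow(efficient_small)
## [1] 4
```

Swapping & for | would instead keep every car that meets at least one of the two conditions.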
slice() allows selecting rows by index. For example, to
select rows 1, 3, and 5 of mtcars:
mtcars %>%
  slice(c(1, 3, 5))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
Or to select all odd-numbered rows:
mtcars %>%
  slice(seq(1, nrow(mtcars), by = 2))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
There are also helper functions to do things like select the first and last rows quickly:
# First 5 rows
slice_head(mtcars, n = 5)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Last 5 rows
slice_tail(mtcars, n = 5)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
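Recent versions of dplyr also include slice_min() and slice_max(), which pick rows by the value of a column rather than by position. The following is a minimal sketch, assuming your installed dplyr provides these helpers; the object names are hypothetical.

```r
library(dplyr)

# The rows with the lowest mpg. Ties are kept by default, so asking
# for n = 1 can return more than one row.
lowest_mpg <- slice_min(mtcars, mpg, n = 1)

# The three rows with the highest mpg. with_ties = FALSE guarantees
# exactly n rows, even if several cars share the cutoff value.
top_three <- slice_max(mtcars, mpg, n = 3, with_ties = FALSE)

lowest_mpg
top_three
```

If you need exactly n rows, setting with_ties = FALSE is the safer choice.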
arrange() sorts data. You provide a column and, by
default, it sorts the rows by that column in ascending order. Compare
the following:
# Not arranged
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# Data arranged by mpg
mtcars %>%
arrange(mpg)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
In the first case, the mtcars data set is in some arbitrary order.
After using arrange(), the rows are ordered from lowest to
highest mpg. Wrapping the target columns in desc() will
arrange rows in descending order:
mtcars %>%
arrange(desc(mpg))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Providing multiple columns to arrange() breaks ties using each succeeding column. Compare sorting by mpg alone with sorting by mpg and then disp:
mtcars %>%
arrange(mpg)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
mtcars %>%
arrange(mpg, disp)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
In the second case, Lincoln Continental comes before Cadillac
Fleetwood, because Lincoln Continental has the lower disp. Again,
desc() can be applied, this time to the tie-breaking column. In this
case, Cadillac Fleetwood comes first:
mtcars %>%
arrange(mpg, desc(disp))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
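Tie-breaking can be hard to spot in a 32-row printout, so here is the same idea on a tiny data frame. This is only a sketch: cars_small and its model names are invented for illustration, with mpg and disp values borrowed from three mtcars rows.

```r
library(dplyr)

# A made-up data frame with a tie in mpg between models A and B
cars_small <- data.frame(
  model = c("A", "B", "C"),
  mpg   = c(10.4, 10.4, 13.3),
  disp  = c(472, 460, 350)
)

# Ties on mpg are broken by disp, smallest first: B, A, C
ascending_ties <- cars_small %>%
  arrange(mpg, disp)

# Ties on mpg are broken by disp, largest first: A, B, C
descending_ties <- cars_small %>%
  arrange(mpg, desc(disp))

ascending_ties
descending_ties
```

Model C stays last in both results because its mpg is higher; only the tied rows swap places.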
select() is among the most common dplyr verbs. It has a
simple but important function: choosing which variables are kept
in a data frame. For example, if you only want to keep the mpg variable
from the mtcars data frame, you can do the following:
mtcars %>%
select(mpg)
## mpg
## Mazda RX4 21.0
## Mazda RX4 Wag 21.0
## Datsun 710 22.8
## Hornet 4 Drive 21.4
## Hornet Sportabout 18.7
## Valiant 18.1
## Duster 360 14.3
## Merc 240D 24.4
## Merc 230 22.8
## Merc 280 19.2
## Merc 280C 17.8
## Merc 450SE 16.4
## Merc 450SL 17.3
## Merc 450SLC 15.2
## Cadillac Fleetwood 10.4
## Lincoln Continental 10.4
## Chrysler Imperial 14.7
## Fiat 128 32.4
## Honda Civic 30.4
## Toyota Corolla 33.9
## Toyota Corona 21.5
## Dodge Challenger 15.5
## AMC Javelin 15.2
## Camaro Z28 13.3
## Pontiac Firebird 19.2
## Fiat X1-9 27.3
## Porsche 914-2 26.0
## Lotus Europa 30.4
## Ford Pantera L 15.8
## Ferrari Dino 19.7
## Maserati Bora 15.0
## Volvo 142E 21.4
If, instead, you’d like to keep multiple variables (say miles per gallon, gears, and carburetors), you can list them in any of the following ways:
mtcars %>%
select(mpg, gear, carb)
## mpg gear carb
## Mazda RX4 21.0 4 4
## Mazda RX4 Wag 21.0 4 4
## Datsun 710 22.8 4 1
## Hornet 4 Drive 21.4 3 1
## Hornet Sportabout 18.7 3 2
## Valiant 18.1 3 1
## Duster 360 14.3 3 4
## Merc 240D 24.4 4 2
## Merc 230 22.8 4 2
## Merc 280 19.2 4 4
## Merc 280C 17.8 4 4
## Merc 450SE 16.4 3 3
## Merc 450SL 17.3 3 3
## Merc 450SLC 15.2 3 3
## Cadillac Fleetwood 10.4 3 4
## Lincoln Continental 10.4 3 4
## Chrysler Imperial 14.7 3 4
## Fiat 128 32.4 4 1
## Honda Civic 30.4 4 2
## Toyota Corolla 33.9 4 1
## Toyota Corona 21.5 3 1
## Dodge Challenger 15.5 3 2
## AMC Javelin 15.2 3 2
## Camaro Z28 13.3 3 4
## Pontiac Firebird 19.2 3 2
## Fiat X1-9 27.3 4 1
## Porsche 914-2 26.0 5 2
## Lotus Europa 30.4 5 2
## Ford Pantera L 15.8 5 4
## Ferrari Dino 19.7 5 6
## Maserati Bora 15.0 5 8
## Volvo 142E 21.4 4 2
mtcars %>%
select(c("mpg", "gear", "carb"))
## mpg gear carb
## Mazda RX4 21.0 4 4
## Mazda RX4 Wag 21.0 4 4
## Datsun 710 22.8 4 1
## Hornet 4 Drive 21.4 3 1
## Hornet Sportabout 18.7 3 2
## Valiant 18.1 3 1
## Duster 360 14.3 3 4
## Merc 240D 24.4 4 2
## Merc 230 22.8 4 2
## Merc 280 19.2 4 4
## Merc 280C 17.8 4 4
## Merc 450SE 16.4 3 3
## Merc 450SL 17.3 3 3
## Merc 450SLC 15.2 3 3
## Cadillac Fleetwood 10.4 3 4
## Lincoln Continental 10.4 3 4
## Chrysler Imperial 14.7 3 4
## Fiat 128 32.4 4 1
## Honda Civic 30.4 4 2
## Toyota Corolla 33.9 4 1
## Toyota Corona 21.5 3 1
## Dodge Challenger 15.5 3 2
## AMC Javelin 15.2 3 2
## Camaro Z28 13.3 3 4
## Pontiac Firebird 19.2 3 2
## Fiat X1-9 27.3 4 1
## Porsche 914-2 26.0 5 2
## Lotus Europa 30.4 5 2
## Ford Pantera L 15.8 5 4
## Ferrari Dino 19.7 5 6
## Maserati Bora 15.0 5 8
## Volvo 142E 21.4 4 2
mtcars %>%
select(c(mpg, gear, carb))
## mpg gear carb
## Mazda RX4 21.0 4 4
## Mazda RX4 Wag 21.0 4 4
## Datsun 710 22.8 4 1
## Hornet 4 Drive 21.4 3 1
## Hornet Sportabout 18.7 3 2
## Valiant 18.1 3 1
## Duster 360 14.3 3 4
## Merc 240D 24.4 4 2
## Merc 230 22.8 4 2
## Merc 280 19.2 4 4
## Merc 280C 17.8 4 4
## Merc 450SE 16.4 3 3
## Merc 450SL 17.3 3 3
## Merc 450SLC 15.2 3 3
## Cadillac Fleetwood 10.4 3 4
## Lincoln Continental 10.4 3 4
## Chrysler Imperial 14.7 3 4
## Fiat 128 32.4 4 1
## Honda Civic 30.4 4 2
## Toyota Corolla 33.9 4 1
## Toyota Corona 21.5 3 1
## Dodge Challenger 15.5 3 2
## AMC Javelin 15.2 3 2
## Camaro Z28 13.3 3 4
## Pontiac Firebird 19.2 3 2
## Fiat X1-9 27.3 4 1
## Porsche 914-2 26.0 5 2
## Lotus Europa 30.4 5 2
## Ford Pantera L 15.8 5 4
## Ferrari Dino 19.7 5 6
## Maserati Bora 15.0 5 8
## Volvo 142E 21.4 4 2
As shown above, columns can be provided as a comma-separated list, a character vector, or a vector of unquoted column names. The vector forms are especially useful if, instead, you want to remove some variables from the data frame by placing a minus sign in front of the vector:
mtcars %>%
select(-c("mpg", "gear", "carb"))
## cyl disp hp drat wt qsec vs am
## Mazda RX4 6 160.0 110 3.90 2.620 16.46 0 1
## Mazda RX4 Wag 6 160.0 110 3.90 2.875 17.02 0 1
## Datsun 710 4 108.0 93 3.85 2.320 18.61 1 1
## Hornet 4 Drive 6 258.0 110 3.08 3.215 19.44 1 0
## Hornet Sportabout 8 360.0 175 3.15 3.440 17.02 0 0
## Valiant 6 225.0 105 2.76 3.460 20.22 1 0
## Duster 360 8 360.0 245 3.21 3.570 15.84 0 0
## Merc 240D 4 146.7 62 3.69 3.190 20.00 1 0
## Merc 230 4 140.8 95 3.92 3.150 22.90 1 0
## Merc 280 6 167.6 123 3.92 3.440 18.30 1 0
## Merc 280C 6 167.6 123 3.92 3.440 18.90 1 0
## Merc 450SE 8 275.8 180 3.07 4.070 17.40 0 0
## Merc 450SL 8 275.8 180 3.07 3.730 17.60 0 0
## Merc 450SLC 8 275.8 180 3.07 3.780 18.00 0 0
## Cadillac Fleetwood 8 472.0 205 2.93 5.250 17.98 0 0
## Lincoln Continental 8 460.0 215 3.00 5.424 17.82 0 0
## Chrysler Imperial 8 440.0 230 3.23 5.345 17.42 0 0
## Fiat 128 4 78.7 66 4.08 2.200 19.47 1 1
## Honda Civic 4 75.7 52 4.93 1.615 18.52 1 1
## Toyota Corolla 4 71.1 65 4.22 1.835 19.90 1 1
## Toyota Corona 4 120.1 97 3.70 2.465 20.01 1 0
## Dodge Challenger 8 318.0 150 2.76 3.520 16.87 0 0
## AMC Javelin 8 304.0 150 3.15 3.435 17.30 0 0
## Camaro Z28 8 350.0 245 3.73 3.840 15.41 0 0
## Pontiac Firebird 8 400.0 175 3.08 3.845 17.05 0 0
## Fiat X1-9 4 79.0 66 4.08 1.935 18.90 1 1
## Porsche 914-2 4 120.3 91 4.43 2.140 16.70 0 1
## Lotus Europa 4 95.1 113 3.77 1.513 16.90 1 1
## Ford Pantera L 8 351.0 264 4.22 3.170 14.50 0 1
## Ferrari Dino 6 145.0 175 3.62 2.770 15.50 0 1
## Maserati Bora 8 301.0 335 3.54 3.570 14.60 0 1
## Volvo 142E 4 121.0 109 4.11 2.780 18.60 1 1
Now the variables of interest have been removed.
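If the column names live in a variable rather than being typed inside select(), one common approach is to wrap the variable in all_of(). The sketch below assumes dplyr is installed; vars_of_interest, kept, and dropped are hypothetical names.

```r
library(dplyr)

# Hypothetical vector holding the column names of interest
vars_of_interest <- c("mpg", "gear", "carb")

# Keep only those columns
kept <- mtcars %>%
  select(all_of(vars_of_interest))

# Drop those columns instead
dropped <- mtcars %>%
  select(-all_of(vars_of_interest))

names(kept)
names(dropped)
```

all_of() raises an error if any listed column is missing, which helps catch typos early.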
Sometimes column names are inconvenient to work with, not
descriptive, or generally do not meet your needs. rename()
can remedy this problem; use it when you want to rename a column in a
data frame:
mtcars %>%
rename(MilesPerGallon = mpg)
## MilesPerGallon cyl disp hp drat wt qsec vs am gear
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4
## carb
## Mazda RX4 4
## Mazda RX4 Wag 4
## Datsun 710 1
## Hornet 4 Drive 1
## Hornet Sportabout 2
## Valiant 1
## Duster 360 4
## Merc 240D 2
## Merc 230 2
## Merc 280 4
## Merc 280C 4
## Merc 450SE 3
## Merc 450SL 3
## Merc 450SLC 3
## Cadillac Fleetwood 4
## Lincoln Continental 4
## Chrysler Imperial 4
## Fiat 128 1
## Honda Civic 2
## Toyota Corolla 1
## Toyota Corona 1
## Dodge Challenger 2
## AMC Javelin 2
## Camaro Z28 4
## Pontiac Firebird 2
## Fiat X1-9 1
## Porsche 914-2 2
## Lotus Europa 2
## Ford Pantera L 4
## Ferrari Dino 6
## Maserati Bora 8
## Volvo 142E 2
The new variable name goes on the left-hand side of the equals sign and the old variable name on the right-hand side. Multiple columns can be renamed at once by separating each expression with a comma:
mtcars %>%
rename(MilesPerGallon = mpg,
Cylinders = cyl,
Carburetors = carb)
## MilesPerGallon Cylinders disp hp drat wt qsec vs am
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1
## gear Carburetors
## Mazda RX4 4 4
## Mazda RX4 Wag 4 4
## Datsun 710 4 1
## Hornet 4 Drive 3 1
## Hornet Sportabout 3 2
## Valiant 3 1
## Duster 360 3 4
## Merc 240D 4 2
## Merc 230 4 2
## Merc 280 4 4
## Merc 280C 4 4
## Merc 450SE 3 3
## Merc 450SL 3 3
## Merc 450SLC 3 3
## Cadillac Fleetwood 3 4
## Lincoln Continental 3 4
## Chrysler Imperial 3 4
## Fiat 128 4 1
## Honda Civic 4 2
## Toyota Corolla 4 1
## Toyota Corona 3 1
## Dodge Challenger 3 2
## AMC Javelin 3 2
## Camaro Z28 3 4
## Pontiac Firebird 3 2
## Fiat X1-9 4 1
## Porsche 914-2 5 2
## Lotus Europa 5 2
## Ford Pantera L 5 4
## Ferrari Dino 5 6
## Maserati Bora 5 8
## Volvo 142E 4 2
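Printing the whole data frame makes it hard to see what changed, so one way to check a rename is to compare the column names before and after. This is a minimal sketch; the object name renamed is made up for illustration.

```r
library(dplyr)

renamed <- mtcars %>%
  rename(MilesPerGallon = mpg,
         Cylinders = cyl,
         Carburetors = carb)

# The original data frame is untouched; only the new object
# carries the new names
names(mtcars)
names(renamed)
```

Note that rename() returns a new data frame, so the result has to be assigned to a name if you want to keep it.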
R has rules about how variables can be named (see
help(make.names) for more information). If you need a name that breaks
those rules (for example, if you are creating a public-facing table), you can
create non-syntactic names by wrapping them in backticks:
mtcars %>%
rename(`Miles Per Gallon` = mpg)
## Miles Per Gallon cyl disp hp drat wt qsec vs am gear
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4
## carb
## Mazda RX4 4
## Mazda RX4 Wag 4
## Datsun 710 1
## Hornet 4 Drive 1
## Hornet Sportabout 2
## Valiant 1
## Duster 360 4
## Merc 240D 2
## Merc 230 2
## Merc 280 4
## Merc 280C 4
## Merc 450SE 3
## Merc 450SL 3
## Merc 450SLC 3
## Cadillac Fleetwood 4
## Lincoln Continental 4
## Chrysler Imperial 4
## Fiat 128 1
## Honda Civic 2
## Toyota Corolla 1
## Toyota Corona 1
## Dodge Challenger 2
## AMC Javelin 2
## Camaro Z28 4
## Pontiac Firebird 2
## Fiat X1-9 1
## Porsche 914-2 2
## Lotus Europa 2
## Ford Pantera L 4
## Ferrari Dino 6
## Maserati Bora 8
## Volvo 142E 2
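One thing to keep in mind is that a non-syntactic name needs backticks every time you refer to it afterwards, not just when you create it. The sketch below illustrates this; public_table and high_mpg are hypothetical names.

```r
library(dplyr)

public_table <- mtcars %>%
  rename(`Miles Per Gallon` = mpg)

# Backticks are needed again whenever the column is referenced
high_mpg <- public_table %>%
  filter(`Miles Per Gallon` > 30) %>%
  select(`Miles Per Gallon`, cyl)

high_mpg
```

For this reason it is often easier to keep syntactic names while you work and switch to display-friendly names as the last step.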
mutate() allows you to modify and create columns in a
data frame. You use mutate() by writing the column name you’d like to
modify or create on the left-hand side of the equals sign and a value
on the right-hand side. For example, you can calculate the mpg to cyl ratio in
mtcars:
mtcars %>%
mutate(mpg_to_cyl = mpg/cyl)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## mpg_to_cyl
## Mazda RX4 3.500000
## Mazda RX4 Wag 3.500000
## Datsun 710 5.700000
## Hornet 4 Drive 3.566667
## Hornet Sportabout 2.337500
## Valiant 3.016667
## Duster 360 1.787500
## Merc 240D 6.100000
## Merc 230 5.700000
## Merc 280 3.200000
## Merc 280C 2.966667
## Merc 450SE 2.050000
## Merc 450SL 2.162500
## Merc 450SLC 1.900000
## Cadillac Fleetwood 1.300000
## Lincoln Continental 1.300000
## Chrysler Imperial 1.837500
## Fiat 128 8.100000
## Honda Civic 7.600000
## Toyota Corolla 8.475000
## Toyota Corona 5.375000
## Dodge Challenger 1.937500
## AMC Javelin 1.900000
## Camaro Z28 1.662500
## Pontiac Firebird 2.400000
## Fiat X1-9 6.825000
## Porsche 914-2 6.500000
## Lotus Europa 7.600000
## Ford Pantera L 1.975000
## Ferrari Dino 3.283333
## Maserati Bora 1.875000
## Volvo 142E 5.350000
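The example above creates a new column. If the name on the left-hand side already exists, mutate() overwrites that column instead. Here is a small sketch, assuming you want weight in pounds rather than thousands of pounds (the unit documented in ?mtcars); the object name in_pounds is hypothetical.

```r
library(dplyr)

# wt is recorded in thousands of pounds, so multiplying by 1000
# converts it to pounds. Because wt already exists, it is replaced
# rather than added as a new column.
in_pounds <- mtcars %>%
  mutate(wt = wt * 1000)

head(in_pounds$wt)
ncol(in_pounds) == ncol(mtcars)  # TRUE: no column was added
```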
As with rename(), if you’d like to manipulate multiple
variables in one call to mutate() you can do that by
separating expressions with a comma:
mtcars %>%
mutate(mpg_to_cyl = mpg/cyl,
mpg_to_carb = mpg/carb,
carb_to_cyl = mpg_to_cyl/mpg_to_carb)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## mpg_to_cyl mpg_to_carb carb_to_cyl
## Mazda RX4 3.500000 5.250000 0.6666667
## Mazda RX4 Wag 3.500000 5.250000 0.6666667
## Datsun 710 5.700000 22.800000 0.2500000
## Hornet 4 Drive 3.566667 21.400000 0.1666667
## Hornet Sportabout 2.337500 9.350000 0.2500000
## Valiant 3.016667 18.100000 0.1666667
## Duster 360 1.787500 3.575000 0.5000000
## Merc 240D 6.100000 12.200000 0.5000000
## Merc 230 5.700000 11.400000 0.5000000
## Merc 280 3.200000 4.800000 0.6666667
## Merc 280C 2.966667 4.450000 0.6666667
## Merc 450SE 2.050000 5.466667 0.3750000
## Merc 450SL 2.162500 5.766667 0.3750000
## Merc 450SLC 1.900000 5.066667 0.3750000
## Cadillac Fleetwood 1.300000 2.600000 0.5000000
## Lincoln Continental 1.300000 2.600000 0.5000000
## Chrysler Imperial 1.837500 3.675000 0.5000000
## Fiat 128 8.100000 32.400000 0.2500000
## Honda Civic 7.600000 15.200000 0.5000000
## Toyota Corolla 8.475000 33.900000 0.2500000
## Toyota Corona 5.375000 21.500000 0.2500000
## Dodge Challenger 1.937500 7.750000 0.2500000
## AMC Javelin 1.900000 7.600000 0.2500000
## Camaro Z28 1.662500 3.325000 0.5000000
## Pontiac Firebird 2.400000 9.600000 0.2500000
## Fiat X1-9 6.825000 27.300000 0.2500000
## Porsche 914-2 6.500000 13.000000 0.5000000
## Lotus Europa 7.600000 15.200000 0.5000000
## Ford Pantera L 1.975000 3.950000 0.5000000
## Ferrari Dino 3.283333 3.283333 1.0000000
## Maserati Bora 1.875000 1.875000 1.0000000
## Volvo 142E 5.350000 10.700000 0.5000000
Sometimes you want to apply the same transformation to multiple
columns at once. There are several ways to do this using dplyr, but a
common one is the function across(). across()
is especially useful when summarizing, but can also be used in
conjunction with mutate(). For example, suppose you wanted to replace
every value of cyl, mpg, and carb with the mean of its column:
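mtcars %>%
  # Passing extra arguments such as na.rm through across() is deprecated as of dplyr 1.1.0,
  # so mean() is wrapped in a lambda instead.
  mutate(across(c(cyl, mpg, carb), ~ mean(.x, na.rm = TRUE))) %>%
  select(mpg, cyl, carb)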
## mpg cyl carb
## Mazda RX4 20.09062 6.1875 2.8125
## Mazda RX4 Wag 20.09062 6.1875 2.8125
## Datsun 710 20.09062 6.1875 2.8125
## Hornet 4 Drive 20.09062 6.1875 2.8125
## Hornet Sportabout 20.09062 6.1875 2.8125
## Valiant 20.09062 6.1875 2.8125
## Duster 360 20.09062 6.1875 2.8125
## Merc 240D 20.09062 6.1875 2.8125
## Merc 230 20.09062 6.1875 2.8125
## Merc 280 20.09062 6.1875 2.8125
## Merc 280C 20.09062 6.1875 2.8125
## Merc 450SE 20.09062 6.1875 2.8125
## Merc 450SL 20.09062 6.1875 2.8125
## Merc 450SLC 20.09062 6.1875 2.8125
## Cadillac Fleetwood 20.09062 6.1875 2.8125
## Lincoln Continental 20.09062 6.1875 2.8125
## Chrysler Imperial 20.09062 6.1875 2.8125
## Fiat 128 20.09062 6.1875 2.8125
## Honda Civic 20.09062 6.1875 2.8125
## Toyota Corolla 20.09062 6.1875 2.8125
## Toyota Corona 20.09062 6.1875 2.8125
## Dodge Challenger 20.09062 6.1875 2.8125
## AMC Javelin 20.09062 6.1875 2.8125
## Camaro Z28 20.09062 6.1875 2.8125
## Pontiac Firebird 20.09062 6.1875 2.8125
## Fiat X1-9 20.09062 6.1875 2.8125
## Porsche 914-2 20.09062 6.1875 2.8125
## Lotus Europa 20.09062 6.1875 2.8125
## Ford Pantera L 20.09062 6.1875 2.8125
## Ferrari Dino 20.09062 6.1875 2.8125
## Maserati Bora 20.09062 6.1875 2.8125
## Volvo 142E 20.09062 6.1875 2.8125
This example is not particularly useful, since you would rarely want every car to share the same mpg, cyl, and carb values, but it illustrates how across() applies one function to several columns at once.
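Since across() is most often paired with summarize(), here is a minimal sketch of that use. It assumes dplyr is installed; the object names col_means and means_by_gear are only illustrative. Swapping mutate() for summarize() returns a single row of column means instead of repeating the means on every row.

```r
library(dplyr)

# One row holding the mean of each selected column
col_means <- mtcars %>%
  summarize(across(c(mpg, cyl, carb), ~ mean(.x, na.rm = TRUE)))

col_means

# Grouping first gives one row of means per group (here, per number of gears)
means_by_gear <- mtcars %>%
  group_by(gear) %>%
  summarize(across(c(mpg, cyl, carb), ~ mean(.x, na.rm = TRUE)))

means_by_gear
```

The values in col_means should match the repeated values in the mutate() output above (about 20.09, 6.19, and 2.81).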
relocate() lets you change the position of columns in
a data frame. You name the columns that you would like to
move and the column you would like to place them before (with
.before) or after (with .after). For example,
to move the disp and hp columns before the mpg column you could do the
following:
mtcars %>%
  relocate(disp, hp, .before = mpg)
## disp hp mpg cyl drat wt qsec vs am gear carb
## Mazda RX4 160.0 110 21.0 6 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 160.0 110 21.0 6 3.90 2.875 17.02 0 1 4 4
## Datsun 710 108.0 93 22.8 4 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 258.0 110 21.4 6 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 360.0 175 18.7 8 3.15 3.440 17.02 0 0 3 2
## Valiant 225.0 105 18.1 6 2.76 3.460 20.22 1 0 3 1
## Duster 360 360.0 245 14.3 8 3.21 3.570 15.84 0 0 3 4
## Merc 240D 146.7 62 24.4 4 3.69 3.190 20.00 1 0 4 2
## Merc 230 140.8 95 22.8 4 3.92 3.150 22.90 1 0 4 2
## Merc 280 167.6 123 19.2 6 3.92 3.440 18.30 1 0 4 4
## Merc 280C 167.6 123 17.8 6 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 275.8 180 16.4 8 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 275.8 180 17.3 8 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 275.8 180 15.2 8 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 472.0 205 10.4 8 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 460.0 215 10.4 8 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 440.0 230 14.7 8 3.23 5.345 17.42 0 0 3 4
## Fiat 128 78.7 66 32.4 4 4.08 2.200 19.47 1 1 4 1
## Honda Civic 75.7 52 30.4 4 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 71.1 65 33.9 4 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 120.1 97 21.5 4 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 318.0 150 15.5 8 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 304.0 150 15.2 8 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 350.0 245 13.3 8 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 400.0 175 19.2 8 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 79.0 66 27.3 4 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 120.3 91 26.0 4 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 95.1 113 30.4 4 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 351.0 264 15.8 8 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 145.0 175 19.7 6 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 301.0 335 15.0 8 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 121.0 109 21.4 4 4.11 2.780 18.60 1 1 4 2
Or, to get the same column order by moving mpg and cyl after hp instead:
mtcars %>%
  relocate(mpg, cyl, .after = hp)
## disp hp mpg cyl drat wt qsec vs am gear carb
## Mazda RX4 160.0 110 21.0 6 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 160.0 110 21.0 6 3.90 2.875 17.02 0 1 4 4
## Datsun 710 108.0 93 22.8 4 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 258.0 110 21.4 6 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 360.0 175 18.7 8 3.15 3.440 17.02 0 0 3 2
## Valiant 225.0 105 18.1 6 2.76 3.460 20.22 1 0 3 1
## Duster 360 360.0 245 14.3 8 3.21 3.570 15.84 0 0 3 4
## Merc 240D 146.7 62 24.4 4 3.69 3.190 20.00 1 0 4 2
## Merc 230 140.8 95 22.8 4 3.92 3.150 22.90 1 0 4 2
## Merc 280 167.6 123 19.2 6 3.92 3.440 18.30 1 0 4 4
## Merc 280C 167.6 123 17.8 6 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 275.8 180 16.4 8 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 275.8 180 17.3 8 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 275.8 180 15.2 8 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 472.0 205 10.4 8 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 460.0 215 10.4 8 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 440.0 230 14.7 8 3.23 5.345 17.42 0 0 3 4
## Fiat 128 78.7 66 32.4 4 4.08 2.200 19.47 1 1 4 1
## Honda Civic 75.7 52 30.4 4 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 71.1 65 33.9 4 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 120.1 97 21.5 4 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 318.0 150 15.5 8 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 304.0 150 15.2 8 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 350.0 245 13.3 8 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 400.0 175 19.2 8 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 79.0 66 27.3 4 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 120.3 91 26.0 4 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 95.1 113 30.4 4 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 351.0 264 15.8 8 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 145.0 175 19.7 6 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 301.0 335 15.0 8 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 121.0 109 21.4 4 4.11 2.780 18.60 1 1 4 2
relocate() is especially useful when outputting tables
(where the order of variables is important).
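Two further patterns may be worth knowing; the sketch below is an illustration rather than part of the original examples, and the object names are hypothetical. When neither .before nor .after is given, relocate() moves the named columns to the front, and the tidyselect helper last_col() can be used to send a column to the end.

```r
library(dplyr)

# With no .before or .after, the named columns move to the front
gear_first <- mtcars %>%
  relocate(gear, carb)

names(gear_first)

# last_col() refers to the final column, so this moves mpg to the end
mpg_last <- mtcars %>%
  relocate(mpg, .after = last_col())

names(mpg_last)
```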
summarize() (or summarise()) collapses a data frame,
typically to one row per group, where the groups are defined by
group_by() or other means. The most common use of summarize() is for
calculating summary statistics:
# Group mtcars by mpg and calculate the mean cylinders and max carburetors within those groups.
mtcars %>%
  group_by(mpg) %>%
  summarize(cyl = mean(cyl),
            carb = max(carb))
## # A tibble: 25 × 3
## mpg cyl carb
## <dbl> <dbl> <dbl>
## 1 10.4 8 4
## 2 13.3 8 4
## 3 14.3 8 4
## 4 14.7 8 4
## 5 15 8 8
## 6 15.2 8 3
## 7 15.5 8 2
## 8 15.8 8 4
## 9 16.4 8 3
## 10 17.3 8 3
## # … with 15 more rows
But since summarize() applies functions within groups,
it can be used for other things, such as combining text within each group:
# Create mock data: two IDs, each with two pieces of text
ids <- c(1, 1, 2, 2)
text <- c("Hi", "I'm", "Reggie", "G.")
text_data <- data.frame(id = ids, text = text)
head(text_data)
## id text
## 1 1 Hi
## 2 1 I'm
## 3 2 Reggie
## 4 2 G.
# Combine text by ID
text_data %>%
  group_by(id) %>%
  summarize(text = paste(text, collapse = " "))
## # A tibble: 2 × 2
## id text
## <dbl> <chr>
## 1 1 Hi I'm
## 2 2 Reggie G.
This can come in handy in a variety of contexts, such as working with survey data or parsing a PDF.
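The two uses of summarize() shown above can also be combined in one call. The sketch below is an illustration under a few assumptions: it groups by cyl rather than mpg so that the groups are larger, it uses rownames_to_column() from the tibble package to turn the car names into a regular column, and the names model and cars_by_cyl are hypothetical.

```r
library(dplyr)
library(tibble)

cars_by_cyl <- mtcars %>%
  rownames_to_column("model") %>%   # car names become an ordinary column
  group_by(cyl) %>%
  summarize(
    n_cars   = n(),                          # number of rows in each group
    mean_mpg = mean(mpg),                    # a numeric summary
    models   = paste(model, collapse = ", ") # a text summary
  )

cars_by_cyl
```

The result has one row for each value of cyl, with a count, an average, and a single string listing every car in that group.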
In general, data sets that you come across while working or doing
research will be messy. They will contain unexpected values,
missing data, unclear variables, and other problems. To make
these irregularities easier to spot and to shape data into a
form that is easy to analyze, statisticians and programmers have developed
a set of rules that together define tidy
data. There are three important things to remember about tidy data (see
vignette("tidy-data"), from which these rules are
drawn):
1. Each variable is a column; each column is a variable.
2. Each observation is a row; each row is an observation.
3. Each value is a cell; each cell is a single value.
As you begin to work with more sophisticated data sets, keep these rules in mind. Use the tools that you learn in this course and beyond to mold the data into a tidy data frame. Reshaping and massaging data may seem like time taken away from your analysis, but the time you spend organizing your data will be repaid by a more straightforward analytic process.
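One common departure from tidy structure is a "wide" table in which a single variable is spread across several columns. As a minimal sketch, the example below reshapes such a table with pivot_longer() from the tidyr package. The data are made up for illustration, and the names wide_scores, tidy_scores, test, and score are hypothetical.

```r
library(dplyr)
library(tidyr)

# Mock (made-up) data: one row per student, one column per test
wide_scores <- data.frame(
  student = c("A", "B"),
  test_1  = c(80, 90),
  test_2  = c(85, 95)
)

# Reshape so that each row is one student-test observation
tidy_scores <- wide_scores %>%
  pivot_longer(
    cols      = c(test_1, test_2),
    names_to  = "test",   # the old column names become values of this column
    values_to = "score"   # the old cell values become values of this column
  )

tidy_scores
```

After reshaping, test and score are each a single column, which makes steps such as group_by(test) followed by summarize() straightforward.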